Speeding up correlation search for binary data
Abstract
Finding the most interesting correlations in a collection of items is essential for problems in many commercial, medical, and scientific domains. Much previous research focuses on finding correlated pairs rather than correlated itemsets, in which all items are correlated with each other. Although some existing methods find correlated itemsets of any size, they suffer from both efficiency and effectiveness problems on large datasets. In our previous paper [10], we proposed a fully-correlated itemset (FCI) framework that decouples the correlation measure from the need for efficient search. By wrapping the desired measure in the FCI framework, we take advantage of that measure's strength in evaluating itemsets, eliminate itemsets containing irrelevant items, and achieve good computational performance. However, unlike frequent-itemset mining, which can start pruning from 1-itemsets, FCI search must start pruning from 2-itemsets. When the number of items n in a dataset is large and the supports of all pairs cannot be held in memory, the O(n²) I/O cost of computing the correlation of every pair is very high. In addition, users usually need to try several correlation thresholds, and rerunning the Apriori procedure for each new threshold is expensive. We therefore propose two techniques to address these efficiency problems. For correlated-pair search, we identify a 1-dimensional monotone property of the upper bound of any good correlation measure, and different 2-dimensional monotone properties for different types of correlation measures. Using these properties, we can either apply a 2-dimensional search algorithm to retrieve the correlated pairs above a given threshold, or apply our new Token-Ring algorithm to find the top-k correlated pairs; both approaches prune many pairs without computing their correlations.
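The 1-dimensional monotone property can be illustrated with the φ coefficient for binary items. For two items with supp(A) ≥ supp(B), setting supp(AB) = supp(B) gives the upper bound φ(A, B) ≤ sqrt(supp(B)(1 − supp(A)) / (supp(A)(1 − supp(B)))), the bound used in earlier pair-search work such as TAPER; for a fixed A it shrinks as supp(B) shrinks. The Python below is a minimal sketch under stated assumptions: φ stands in for "any good correlation measure", transactions are in-memory sets, and all function names are hypothetical. `correlated_pairs` uses only the 1-D property for a threshold query; `top_k_pairs` is a generic bound-plus-heap search, not the paper's Token-Ring algorithm, and the paper's 2-dimensional search is not reproduced.

```python
import heapq
import math


def support_counts(transactions):
    """Number of transactions containing each item."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return counts


def pair_count(transactions, a, b):
    """Co-occurrence count of a and b: the costly scan that pruning tries to avoid."""
    return sum(1 for t in transactions if a in t and b in t)


def phi(n_ab, n_a, n_b, n):
    """Phi coefficient of two binary items from their counts (0.0 if undefined)."""
    pa, pb, pab = n_a / n, n_b / n, n_ab / n
    denom = math.sqrt(pa * pb * (1 - pa) * (1 - pb))
    return (pab - pa * pb) / denom if denom > 0 else 0.0


def phi_upper(n_a, n_b, n):
    """Upper bound of phi for n_a >= n_b, reached when supp(ab) = supp(b)."""
    if n_b == 0 or n_a == n:
        return 0.0  # phi is undefined here and phi() reports 0.0
    pa, pb = n_a / n, n_b / n
    return math.sqrt(pb * (1 - pa) / (pa * (1 - pb)))


def ranked_items(counts):
    """Items sorted by decreasing support (ties broken by item id)."""
    return sorted(counts, key=lambda i: (-counts[i], i))


def correlated_pairs(transactions, theta):
    """All pairs with phi >= theta; returns (pairs, number of phi values computed)."""
    n, counts = len(transactions), support_counts(transactions)
    items = ranked_items(counts)
    result, computed = [], 0
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # supp(b) only decreases along this scan, so the bound does too (1-D monotone):
            # the first b whose bound falls below theta ends the scan for a.
            if phi_upper(counts[a], counts[b], n) < theta:
                break
            computed += 1
            r = phi(pair_count(transactions, a, b), counts[a], counts[b], n)
            if r >= theta:
                result.append((a, b, r))
    return result, computed


def top_k_pairs(transactions, k):
    """k pairs with the largest phi, pruned with the same bound (generic heap search)."""
    if k <= 0:
        return []
    n, counts = len(transactions), support_counts(transactions)
    items = ranked_items(counts)
    heap = []  # min-heap of (phi, a, b): the k best pairs found so far
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Once the heap is full, a bound <= the k-th best value means no later b can enter.
            if len(heap) == k and phi_upper(counts[a], counts[b], n) <= heap[0][0]:
                break
            r = phi(pair_count(transactions, a, b), counts[a], counts[b], n)
            if len(heap) < k:
                heapq.heappush(heap, (r, a, b))
            elif r > heap[0][0]:
                heapq.heapreplace(heap, (r, a, b))
    return sorted(heap, reverse=True)
```

Items are scanned in decreasing order of support so that each row of candidate partners can stop at the first pair whose bound fails; pairs beyond that point never trigger the costly `pair_count` scan.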
In addition, to speed up FCI search, we build an enumeration tree that stores the fully-correlated value (FCV) of every FCI found under an initial threshold. For any threshold above the initial one, we can then efficiently retrieve the desired FCIs from the tree; for a threshold below it, we can grow the tree incrementally.
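The retrieval step can be made concrete with a small enumeration tree. This is a minimal sketch that assumes (the abstract does not define it) that the FCV of an itemset is the smallest correlation among its subsets with at least two items, so an itemset is an FCI at threshold θ exactly when its FCV ≥ θ, and the FCV can only decrease as items are added. The class `FCITree`, the `corr` callable, and the toy correlation table in any usage are hypothetical; incremental growth for a lower threshold is not implemented.

```python
import math


class Node:
    """One enumeration-tree node: an itemset in canonical order plus its FCV."""

    def __init__(self, itemset, fcv):
        self.itemset = itemset
        self.fcv = fcv
        self.children = []


class FCITree:
    def __init__(self, items, corr, theta0):
        self.items = sorted(items)
        self.corr = corr        # hypothetical measure: frozenset of >= 2 items -> float
        self.theta0 = theta0    # initial threshold used to build the tree
        self._memo = {}
        self.root = Node((), math.inf)
        self._grow(self.root, 0)

    def fcv(self, s):
        """Smallest correlation over all subsets of s with >= 2 items (assumed FCV)."""
        key = frozenset(s)
        if key not in self._memo:
            value = self.corr(key)
            if len(key) > 2:
                # Every subset of size >= 2 lies inside some (|s|-1)-subset, so recurse on those.
                value = min([value] + [self.fcv(key - {x}) for x in key])
            self._memo[key] = value
        return self._memo[key]

    def _grow(self, node, start):
        # Each child extends its parent by one item that comes later in sorted order.
        for j in range(start, len(self.items)):
            itemset = node.itemset + (self.items[j],)
            value = math.inf if len(itemset) < 2 else self.fcv(itemset)
            if value < self.theta0:
                continue  # supersets have FCV <= value, so the whole subtree is pruned
            child = Node(itemset, value)
            node.children.append(child)
            self._grow(child, j + 1)

    def query(self, theta):
        """All FCIs with FCV >= theta, for any theta >= theta0, without recomputing corr."""
        if theta < self.theta0:
            raise ValueError("theta below the initial threshold: the tree must be grown first")
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            for child in node.children:
                if child.fcv < theta:
                    continue  # descendants have FCV <= child.fcv < theta
                if len(child.itemset) >= 2:
                    found.append(child.itemset)
                stack.append(child)
        return sorted(found)
```

Because the stored FCVs never increase down a branch, a query at a higher threshold simply cuts the tree at the first node that fails, so no correlation is recomputed.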
Similar articles
Fast Algorithms for Computing Binary Correlation
Cross-correlation is widely used to match images. Cross-correlation of windows whose pixels take binary values is needed when thresholded sign-of-Laplacian images are matched. Nishihara proposed using the sign of the Laplacian of an image, a characteristic that is robust to illumination changes and noise, to match images. Thresholding the sign of the Laplacian of an image results i...
Index Search Algorithms for Databases and Modern CPUs
Over the years, many different indexing techniques and search algorithms have been proposed, including CSS-trees, CSB+-trees, k-ary binary search, and fast architecture sensitive tree search. There have also been papers on how best to set the many different parameters of these index structures, such as the node size of CSB+-trees. These indices have been proposed because CPU speeds have been in...
Transition Trees for Cost-Optimal Symbolic Planning
Symbolic search with binary decision diagrams (BDDs) often saves huge amounts of memory and computation time. In this paper we propose two general techniques based on transition relation trees to advance BDD search by refining the image operator to compute the set of successors. First, the conjunction tree selects the set of applicable actions through filtering their precondition. Then, the dis...
Predictors of speeding among drivers based on Prototype Willingness Model
Background: Every year 1.2 million people are killed in road accidents, and speeding is a major contributor to road crashes among young drivers; 40% of fatal crashes involved speeding. The purpose of this study was to determine the predictors of speeding intention among young drivers aged 19-25 years in Ghaemshahr based on the Prototype Willingness Model. Materials and methods: I...
Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method
In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition varies gradually over their volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...
Journal: Pattern Recognition Letters
Volume: 34
Publication date: 2013